ABSTRACT

A theoretical model of how an expert programmer goes
about understanding a piece of software is presented. This
understanding plays an especially critical role in software
maintenance tasks. The model is based on three cognitive
processes: CHUNKING, SLICING, and HYPOTHESIS GENERATION and
VERIFICATION. These processes are used in conjunction with
a programmer's knowledge base, and categories of information
critical to program understanding are identified. The model
also takes advantage of certain characteristics of an
associative memory to describe, using a semantic net
representation, the mechanisms behind these processes and
the organization of memory resulting from their use. The
benefits of documentation and the use of commenting and
mnemonics are described in terms of the model and may be
useful as a guide for incorporating these into the code.
I.   INTRODUCTION


Software maintenance now accounts for a large percentage
of any software system's life-cycle cost. In view of this,
the software industry has shifted its emphasis with respect
to program evaluation. No longer is software being judged
solely on the merits of its applicability to a given
problem. While not neglecting the importance of this, the
industry is considering factors which affect software
maintenance as well. One such factor is software
understandability [Ref. 1].

Gaining an understanding of unfamiliar programs is
frequently cited by researchers as the first and often most
costly step in software maintenance. This understanding is
achieved when the programmer has 'learned' all that is
necessary to competently carry out the required maintenance
task. Making software easier to understand would have
significant long-term advantages resulting in reduced
life-cycle costs. This study presents a theoretical model,
based on observed programmer behavior, of the cognitive
processes which aid in acquiring this understanding.
Further, the study contends that the effectiveness of these
processes is dependent upon the extent of the programmer's
knowledge base.


Most cognitive research analysing programmer behavior
supports the idea of levels of skill or ability, and
categorizes programmers as either novice, experienced, or
expert. Based on the proposed theoretical model, this
ability is defined by how well the processes are developed
by the programmer, and the extent of his or her knowledge
base.

A novice has a relatively limited knowledge base.
Consequently, there is very little development of the
cognitive processes in evidence. He or she is considered
primarily a learner, using mainly unsophisticated
techniques, such as inductive reasoning, to gain an
understanding of a program.

An experienced programmer has a fairly extensive
knowledge base. It includes information about most of the
knowledge domains necessary for program understanding. The
depth of information in these domains is, however, uneven.
By this it is meant that an experienced programmer may know
algorithms to perform a certain function, for example to
sort numbers, but may find it difficult to adapt one of
these to sort words. Or, in the category of programming
languages, he or she may be familiar with the syntax and
semantics, but unsure of the underlying design and its
effects on a program.
Although the programmer is still learning, the primary
emphasis at this stage of his or her growth is the
development of cognitive processes which make efficient use
of this knowledge. At this stage, the programmer's
performance is good, though inconsistent, over a spectrum of
less difficult tasks. It does, however, degrade rapidly as
task difficulty increases, indicative of only partially
developed processes and the uneven knowledge base.

An expert, on the other hand, has acquired a broad
knowledge base, including many specifics about programming
languages and design, algorithms and data structures, task
domains, etc., as well as how they relate to one another.
He or she also maintains a consistently high level of
performance relative to task difficulty. This results from
the use of well-developed cognitive processes.

These processes, which make use of the knowledge base
in conjunction with external information (program text,
documentation, problem specifications, etc.), enhance the
expert's ability to gain an in-depth understanding of the
software involved in a given maintenance task. It is this
demonstrated capability that distinguishes the expert from
either a novice or experienced programmer.

Acknowledging this, the choice for this study is to
model an expert involved in the task of understanding an
unfamiliar program in order to perform some type of
maintenance. What these processes are, how they are used,
and what information is contained in the knowledge base
form the major portions of this model. Given the
subjective nature of the study, no claim is made that this
is a definitive model. It is, however, reasonable and
representative of programmer behavior demonstrated by
experts. In fact, this study contends that it is this very
behavior of making efficient use of these processes which
determines expertise in this area.
II.   MEMORY AND RECALL

We know empirically that information is remembered
(stored in the brain) and can be recalled. Most evidence
also supports the hypothesis that human memory is at least
partly associative [Ref. 2]. By this it is meant that
facts, events, concepts, and other types of information are
encoded and stored in memory as separate elements or sets of
elements, connected to one another by means of association.
Each element is stored only once, but can have any number of
associations with other elements. Each element is also
directly accessible. One method of knowledge representation
which incorporates many of the concepts and properties
associated with this type of memory is the semantic net.

As there is no evidence that strongly supports any
theory yet proposed to explain how memory and recall are
accomplished, it should be noted that the model proposed
here uses semantic nets only as a tool. The ideas of
semantic nets will aid in explaining certain cognitive
processes. However, the model itself has been developed
based on research data, and its validity is independent of
this or any other theory regarding how these rudimentary
cerebral functions, memory and recall, are accomplished.

Memory is commonly thought of as having two parts or
areas. These are labeled Long Term Memory and Short Term
Memory. This may not be a physical division (though some
researchers suggest that the two are located in different
areas of the brain) but rather a cognitive one. Some
researchers also include a third area, Working Memory. As
the validity of this additional division of memory is not
critical to the model, the simpler two-part idea is adopted.
A final form of 'memory', called External Memory, is also
used.

A.   SEMANTIC NETS

A semantic net is a directed graph made up of nodes,
representing objects, connected to one another via links.
These links indicate specific relationships or associations
between nodes. This representation of knowledge is very
popular among members of the Artificial Intelligence
community. As there is no definitive set of characteristics
for a semantic net, those relevant to the model proposed
here are described. Much of this information is taken from
a text by WINSTON [Ref. 3], whose description seems standard
when compared to others in the literature. Properties have
been added or altered, however, to aid in explaining certain
behaviors of expert programmers. It is emphasized again
that the model is based on observed behavior, and in no way
depends on the validity of this presentation of semantic
nets, or any other knowledge representation.

Three terms are used here to describe semantic nets.
The objects of the net are called nodes and the relations
between objects are called links. They are represented in
the figures by labeled circles and arrows respectively. A
third term used by WINSTON, which is less standard, is the
slot. The slots of a node are the different named links
originating at the node. An example might serve here to
better describe the use of these terms.

In Figure 1, we have an example of a semantic net. The
five objects are CAR27, which is a specific car; CAR, which
is a general abstraction; DOUG and JILL, which represent
specific people; and the object BLUE. There is an OWNED-BY
link between CAR27 and DOUG, and between CAR27 and JILL.
There is an IS-A link between CAR27 and CAR, and there is a
COLOR link between CAR27 and BLUE. CAR27 has four links
associated with it, but only three slots. The COLOR slot is
filled with the value BLUE, the IS-A slot with the value
CAR, and the OWNED-BY slot with the values DOUG and JILL.
Note that the objects do not have to be tangible items, as
illustrated by the object BLUE. Figure 1 is, of course, a
representation of the knowledge that CAR27 is a blue car
owned by Doug and Jill.


Figure 1 - A Simple Semantic Net
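The node, link, and slot terminology can be made concrete with a minimal Python sketch of the Figure 1 net. The `SemanticNet` class and its methods are hypothetical, written only for illustration: each node keeps its outgoing links grouped by name, so a slot is simply a link name, and one slot (OWNED-BY) may hold several values.

```python
from collections import defaultdict


class SemanticNet:
    """Hypothetical minimal semantic net: nodes joined by named, directed links."""

    def __init__(self):
        # links[node][slot] -> list of target nodes; a node's slots are the
        # distinct link names that originate at it.
        self.links = defaultdict(lambda: defaultdict(list))

    def add_link(self, source, slot, target):
        self.links[source][slot].append(target)

    def slots(self, node):
        return sorted(self.links[node])

    def values(self, node, slot):
        return list(self.links[node].get(slot, []))

    def link_count(self, node):
        return sum(len(targets) for targets in self.links[node].values())


# Figure 1: CAR27 is a blue car owned by Doug and Jill.
net = SemanticNet()
net.add_link("CAR27", "IS-A", "CAR")
net.add_link("CAR27", "COLOR", "BLUE")
net.add_link("CAR27", "OWNED-BY", "DOUG")
net.add_link("CAR27", "OWNED-BY", "JILL")

print(net.link_count("CAR27"))          # 4 links
print(net.slots("CAR27"))               # ['COLOR', 'IS-A', 'OWNED-BY'] (3 slots)
print(net.values("CAR27", "OWNED-BY"))  # ['DOUG', 'JILL']
```

Four links but three slots falls out directly: the two OWNED-BY links share one slot name.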


Figure 2 - Inheritance in Semantic Nets
When CAR27 is thought of, many facts about it come to
mind. It has an engine, tires, and seats. Also, it is a
vehicle used for transportation. Does this mean that, using
our representation, the object CAR27 should have direct
links to the objects ENGINE, TIRE, SEAT, VEHICLE, and
TRANSPORTATION? The answer is no. The way this information
is represented is through a property called inheritance, and
the use of frames.

Inheritance is an object's acquisition of a slot value
by inheriting the value from another object through
association. Figure 2 is a semantic net showing one
representation of the above facts about CAR27. As can be
seen, CAR27 has no USED-FOR link, but does have an IS-A link
to the more abstract object, CAR. CAR, however, also has no
USED-FOR link, but is associated with the object VEHICLE
through an AKO (A Kind Of) link. In tracing the net from
CAR27, VEHICLE is the first node reached which does have a
USED-FOR slot value, TRANSPORTATION. CAR27, therefore,
inherits this value through its indirect association with
VEHICLE.
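The inheritance search just described can be sketched in Python as a walk up the IS-A and AKO links until a node holding the requested slot is found. The dictionary encoding of Figure 2 and the `get_value` function are assumptions made for illustration; the `visited` set only guards against accidental cycles.

```python
# Hypothetical encoding of Figure 2: node -> {slot: value}.
NET = {
    "CAR27": {"IS-A": "CAR"},
    "CAR": {"AKO": "VEHICLE", "HAS": ["ENGINE", "TIRE", "SEAT"]},
    "VEHICLE": {"USED-FOR": "TRANSPORTATION"},
}

# Links followed when a value must be inherited from a more abstract node.
INHERITANCE_LINKS = ("IS-A", "AKO")


def get_value(net, node, slot):
    """Return the first value of `slot` met while tracing IS-A/AKO links upward."""
    visited = set()
    while node is not None and node not in visited:
        visited.add(node)
        entry = net.get(node, {})
        if slot in entry:
            return entry[slot]
        # Step to the next, more abstract node, if there is one.
        node = next((entry[link] for link in INHERITANCE_LINKS if link in entry), None)
    return None


print(get_value(NET, "CAR27", "USED-FOR"))  # TRANSPORTATION (inherited from VEHICLE)
print(get_value(NET, "CAR27", "HAS"))       # ['ENGINE', 'TIRE', 'SEAT'] (from CAR)
print(get_value(NET, "CAR27", "COLOR"))     # None: nothing to inherit
```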

Again looking at Figure 2, notice that the object CAR is
linked to some familiar characteristics of a car via HAS
links. This area of the net, isolated in Figure 3, is
called a FRAME. A frame is a set or cluster of objects
which serve as slot values for an abstract or less specific
object. Its purpose is to group properties common to many
specific objects which are instances of the abstraction.
These properties or slot values are then inherited by the
more specific instances, making the net less complicated.

Slots can be added to or, although less likely,
subtracted from a frame. This would occur as additional
information is incorporated into the net. Because frames
change in this way, they always represent the most current
abstraction relative to the entire semantic net.


Figure 3 - A Frame

A frame also serves to provide DEFAULT values for
incomplete pictures. Let's say, for illustrative purposes,
that one of the slots of the frame representing CAR is the
COLOR slot, and it is filled with the value RED. Now
further suppose another object CAR28 is introduced, but
without a COLOR link. Since all cars must have some
specific color, CAR28 is incomplete. To remedy this, it
inherits the default value RED, until such time as its own
color is added to the knowledge base.
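Default values fall out of the same upward search: an instance that lacks a slot picks up the frame's value, and stops doing so as soon as its own value is recorded. In this hypothetical Python sketch, the RED default follows the text, while GREEN is an invented stand-in for CAR28's later-learned color.

```python
# Hypothetical net: the CAR frame holds a DEFAULT color.
net = {
    "CAR": {"COLOR": "RED"},
    "CAR27": {"IS-A": "CAR", "COLOR": "BLUE"},
    "CAR28": {"IS-A": "CAR"},  # introduced without a COLOR link
}


def lookup(net, node, slot):
    """Check the node itself first, then follow IS-A links up to the frame."""
    while node is not None:
        entry = net.get(node, {})
        if slot in entry:
            return entry[slot]
        node = entry.get("IS-A")
    return None


print(lookup(net, "CAR28", "COLOR"))  # RED: the frame's default fills the gap
net["CAR28"]["COLOR"] = "GREEN"       # CAR28's own color is learned later
print(lookup(net, "CAR28", "COLOR"))  # GREEN: the default no longer applies
print(lookup(net, "CAR27", "COLOR"))  # BLUE
```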

Exceptions and unusual circumstances must also be
accounted for. Using the CAR example again, suppose CAR28
is an experimental model using compressed air for power.
The PROPULSION slot of the CAR frame is filled with the
value ENGINE, yet for CAR28, this would be incorrect. Prior
to knowing the method of PROPULSION, it is 'assumed' that
CAR28 is powered by an engine. Once the method is known,
however, a PROPULSION link is added to CAR28, reflecting the
exception. Now, in trying to fill the PROPULSION slot for
CAR28, the first value arrived at is COMPRESSED-AIR, the
search stops, and the frame slot value becomes
inconsequential. Figure 4 is the representative net.
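The exception case uses the same lookup mechanism; what differs is where the search stops. This hypothetical sketch returns the nodes visited along with the value, showing that CAR28's own PROPULSION link ends the search before the CAR frame is consulted, while CAR27 (assumed here to have no PROPULSION link of its own) still reaches the frame's ENGINE value.

```python
# Hypothetical encoding of Figure 4.
net = {
    "CAR": {"PROPULSION": "ENGINE"},
    "CAR27": {"IS-A": "CAR"},
    "CAR28": {"IS-A": "CAR", "PROPULSION": "COMPRESSED-AIR"},  # the exception
}


def lookup_with_trace(net, node, slot):
    """Return (value, nodes visited); the search stops at the first value found."""
    visited = []
    while node is not None:
        visited.append(node)
        entry = net.get(node, {})
        if slot in entry:
            return entry[slot], visited
        node = entry.get("IS-A")
    return None, visited


print(lookup_with_trace(net, "CAR28", "PROPULSION"))  # ('COMPRESSED-AIR', ['CAR28'])
print(lookup_with_trace(net, "CAR27", "PROPULSION"))  # ('ENGINE', ['CAR27', 'CAR'])
```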

By this explanation, it may appear that all objects
making up a frame are default values, and exceptions nothing
more than specific slot values in lieu of the default.
Each, however, is subtly different. A frame is made up of
attributes of an object. Some, such as engine, tire, or
seat, are common to the majority; as such, they are not
substitute values used for lack of one more specific, but
the same value shared among many objects. An exception
occurs where particulars of an object contradict one of
these shared values. Others, such as color, are common
attributes with possibly different values for each instance
of the item whose abstraction is represented. These are
truly default values, whose purpose is to fill a void until
more specific information is obtained. This information is
not an exception to the frame, but an expected piece of data
previously missing or unknown.


Figure 4 - Semantic Net with Exception
Another quality of an associative memory is the ability
to distinguish the correct usage of an object, through
context or perspective, when many different meanings exist.
This dependency on context must also be represented in the
net. Work cited by COHEN supports the idea that objects
each have many classifications, determined by context
[Ref. 4: pp. 8-10]. This is because certain objects, when
viewed from different perspectives, take on new or different
qualities and attributes. A car, for example, can be looked
at as an automobile, or as a toy, or as the car of a train.
Obviously, each will have different attributes which are
identified through context. The result is one object with
three distinct purposes or aspects.




Figure 5 - A Perspective Node Bundle
One way to represent this in a semantic net is to view
an object as a node bundle. This bundle consists of a
general object node as well as a number of nodes each
representing a different perspective for that object. Links
relevant to a particular context are associated with the
corresponding perspective node.

With such a representation, shown for CAR in Figure 5,
slot values are accessed either with or without a
perspective. Say, for example, the size of CAR is needed.
If CAR is viewed as the car of a train, the returned value
would be quite a bit different than if the inquiry were made
for a toy car. If no perspective is given, the node bundle
collapses to the single CAR node used throughout this
example. This causes all possible slot values to be
returned, each annotated with the associated perspective.
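A node bundle can be sketched as a general node plus a table of perspective nodes. In this illustrative Python version, the perspective names follow the text, but the SIZE values are invented placeholders; `get_slot` with no perspective returns every candidate value keyed by where it came from, mirroring the collapse to a single CAR node.

```python
# Hypothetical node bundle for CAR: one general node plus a node per perspective.
CAR = {
    "general": {"NAME": "CAR"},
    "perspectives": {
        "AUTOMOBILE": {"SIZE": "MEDIUM"},
        "TOY": {"SIZE": "SMALL"},
        "TRAIN": {"SIZE": "LARGE"},
    },
}


def get_slot(bundle, slot, perspective=None):
    """Look up `slot` through one perspective node, or through all of them."""
    if perspective is not None:
        # Perspective-specific links first; shared links on the general node next.
        node = bundle["perspectives"][perspective]
        return node.get(slot, bundle["general"].get(slot))
    # No perspective: collapse the bundle and return every candidate value,
    # annotated with the node it came from.
    values = {name: node[slot]
              for name, node in bundle["perspectives"].items() if slot in node}
    if slot in bundle["general"]:
        values["general"] = bundle["general"][slot]
    return values


print(get_slot(CAR, "SIZE", "TRAIN"))  # LARGE
print(get_slot(CAR, "SIZE", "TOY"))    # SMALL
print(get_slot(CAR, "SIZE"))           # {'AUTOMOBILE': 'MEDIUM', 'TOY': 'SMALL', 'TRAIN': 'LARGE'}
print(get_slot(CAR, "NAME", "TOY"))    # CAR (from the general node)
```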

This notion of node bundles and object classification
leads to the idea of node clustering. Put simply, a node
cluster is a grouping in the net of objects and links
strongly associated with one or two specific objects of the
cluster. MINSKY uses a geographic analogy to illustrate the
idea [Ref. 5: pg. 118]. He suggests picturing capital
cities with streets lined by houses. These cities are
connected via major thoroughfares to smaller suburban cities,
which are in turn connected to towns, etc. The analogy to
clusters, objects, and links is readily apparent.




The implication of this analogy is that semantic nets
are organized hierarchically. If this idea is accepted, it
follows that in order to recall a certain piece of
information, several levels of the hierarchical structure
must be traversed, depending on the point of entry. This
walk through several levels necessarily has an adverse
effect on the speed of recall. Yet, in some instances,
information which should be separated by several levels is
recalled faster than expected, implying an alternative
method. To explain this, MINSKY introduces a second notion
which allows for shortcuts through several levels. The
argument is that if a certain path is reinforced a number of
times through use, a direct link is formed, analogous to
taking back roads to avoid lights and traffic.
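The two recall mechanisms attributed to MINSKY, walking the hierarchy and forming a direct link along a well-used path, can be sketched with a small graph. Everything here is an assumption for illustration: the `HierarchicalNet` class, breadth-first search standing in for the "walk", and a reinforcement threshold of three uses.

```python
from collections import deque


class HierarchicalNet:
    """Hypothetical net in which heavily used recall paths become direct links."""

    def __init__(self, edges, threshold=3):
        self.adj = {}
        for a, b in edges:
            self.adj.setdefault(a, set()).add(b)
            self.adj.setdefault(b, set()).add(a)
        self.threshold = threshold  # assumed number of uses before a shortcut forms
        self.uses = {}

    def _walk(self, start, goal):
        """Breadth-first search; returns the list of nodes from start to goal."""
        prev = {start: None}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            if node == goal:
                break
            for nxt in self.adj.get(node, ()):
                if nxt not in prev:
                    prev[nxt] = node
                    queue.append(nxt)
        if goal not in prev:
            return None
        route = [goal]
        while prev[route[-1]] is not None:
            route.append(prev[route[-1]])
        return route[::-1]

    def recall(self, start, goal):
        """Recall `goal` from `start`; returns how many links were traversed."""
        route = self._walk(start, goal)
        if route is None:
            return None
        self.uses[(start, goal)] = self.uses.get((start, goal), 0) + 1
        if self.uses[(start, goal)] >= self.threshold:
            # Reinforced path: add a direct link (the "back road").
            self.adj.setdefault(start, set()).add(goal)
            self.adj.setdefault(goal, set()).add(start)
        return len(route) - 1


# MINSKY's geography: capital -> suburb -> town -> house.
net = HierarchicalNet([("CAPITAL", "SUBURB"), ("SUBURB", "TOWN"), ("TOWN", "HOUSE")])
print([net.recall("CAPITAL", "HOUSE") for _ in range(4)])  # [3, 3, 3, 1]
```

The first three recalls walk all three levels; the third use crosses the threshold, so the fourth recall takes the new direct link.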

These properties of semantic nets reflect those of an
associative memory and will be referred to extensively
throughout the remainder of this paper. Details will be
added as necessary to further explain behaviors, and this
should make these semantic net properties clearer. However,
it is important for the reader to understand these before
proceeding.
B.   SHORT TERM MEMORY

Information enters the cognitive system through short
term memory. CURTIS [Ref. 6] quite adequately describes
this memory as:

"a limited capacity workspace which holds and processes
those items of information currently under our attention."

This limited capacity was first quantified by MILLER as 7±2
items [Ref. 7]. As will be seen later, an item is not
limited to a single memory element, and may be a 'chunk' of
indefinite size.

The information which exists in short term memory is
transient and must be constantly used or 'rehearsed' to
prevent its rapid decay [Ref. 8]. If the information is
gained via perception, this rehearsal will, after a time,
fix the information in long term memory. This is sometimes
called the learning process. If, on the other hand, the
information being used was recalled from long term memory,
this rehearsal serves to reinforce it. This reinforcement
has a positive effect on the future recall of this
information and may cause it to migrate due to repetitive
use. Both rapidity of recall and information migration are
discussed later as they pertain to the model.
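The capacity limit, decay, and rehearsal described above can be illustrated with a toy Python model. All numbers (a capacity of 7, a three-tick lifetime, four rehearsals before an item is fixed in long term memory) and the rule that a new item displaces the one nearest to decay are assumptions chosen for illustration, not findings of the cited studies; the sketch also models only the perception-to-learning path, not the reinforcement of recalled items.

```python
class ShortTermMemory:
    """Hypothetical toy model of a limited-capacity, decaying workspace."""

    def __init__(self, capacity=7, lifetime=3, learn_after=4):
        self.capacity = capacity        # assumed: Miller's 7, ignoring the +/-2 spread
        self.lifetime = lifetime        # assumed ticks an unrehearsed item survives
        self.learn_after = learn_after  # assumed rehearsals before transfer to LTM
        self.items = {}                 # item -> ticks left before it decays
        self.rehearsals = {}            # item -> times rehearsed so far
        self.long_term = set()

    def _forget(self, item):
        del self.items[item]
        self.rehearsals.pop(item, None)

    def attend(self, item):
        """Bring an item (a single element or a whole chunk) under attention."""
        if item not in self.items and len(self.items) >= self.capacity:
            # Assumed displacement rule: the item nearest to decay is lost.
            self._forget(min(self.items, key=self.items.get))
        self.items[item] = self.lifetime

    def rehearse(self, item):
        """Rehearsal refreshes an item and, if repeated enough, fixes it in LTM."""
        if item in self.items:
            self.items[item] = self.lifetime
            self.rehearsals[item] = self.rehearsals.get(item, 0) + 1
            if self.rehearsals[item] >= self.learn_after:
                self.long_term.add(item)

    def tick(self):
        """One unit of time passes; every item's remaining lifetime drops by one."""
        for item in list(self.items):
            self.items[item] -= 1
            if self.items[item] <= 0:
                self._forget(item)


# Nine separate letters overflow the workspace...
stm = ShortTermMemory()
for letter in "ABCDEFGHI":
    stm.attend(letter)
print(len(stm.items))  # 7

# ...but the same letters held as three chunks fit easily.
chunked = ShortTermMemory()
for chunk in ("ABC", "DEF", "GHI"):
    chunked.attend(chunk)
for _ in range(4):
    chunked.rehearse("ABC")  # only ABC is rehearsed
    chunked.tick()
print(len(chunked.items), chunked.long_term)  # 1 {'ABC'}
```

The unrehearsed chunks DEF and GHI decay away, while the rehearsed chunk ABC both survives and is fixed in long term memory.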
C.   LONG TERM MEMORY

When we learn or memorize something, the information is
retained in long term memory. When some event causes the
recall of other events in the mind, the information comes
from long term memory. It is the reservoir of permanent
knowledge used in cognition, and has stored in it everything
from the spatial model of the world to the motor and
perceptual skills used moment to moment [Ref. 9: pg. 56].
Put simply, it is the knowledge base we operate from.

Unlike short term memory, the capacity of long term
memory seems virtually unlimited. It receives and stores
new information after processing in short term memory, and
this information is directly accessible, once stored. Also,
research has shown that the knowledge in long term memory is
organized, and that the organization may change almost
instantaneously, based on the context of the information
being processed in short term memory. As will be seen
later, this ability is significant in terms of the model,
and will be discussed in more detail as it relates to an
expert programmer's knowledge base.

D.   EXTERNAL MEMORY

As an aid to information processing, external devices
such as pencil and paper, chalkboards, and tape recorders
are used to store information, not held in long term memory,
which the programmer wants readily available for reference.
This helps to compensate for the limited capacity of short
term memory, and complements long term memory. All methods
used for this purpose are generally referred to as external
memories.
III.   KNOWLEDGE BASE


"Experts and novices differ in their abilities to process
large amounts of meaningful information. . . . A common
explanation of this difference is that experts have not
only more information, they have the information better
organized . . . making their perception more efficient and
their recall performance much higher." [Ref. 10]


The above quote emphasizes the importance of both the
contents and the organization of the knowledge base.
Included in the discussion presented here is the conviction
that the contents of memory somehow affect this
organization. Also, based on data from several studies
referenced, this organization is dynamic and dependent on
context.

A.   CONTENTS 

Along with basic knowledge, normally acquired through
grade school and college, the expert programmer knows a
great deal about five major categories of knowledge
associated with programming. These are:

- ALGORITHMS

- PROGRAMMING LANGUAGES

- LOGIC

- DATA STRUCTURES

- PROGRAMMING DESIGN METHODOLOGIES
The depth of knowledge in these categories allows the expert
to quickly focus on the important aspects of new
information. Using the processes covered in the next
chapter, he or she can then encode this information and
relate it to what is already in long term memory.

Experts are familiar with many algorithms which do
essentially the same job. Associated with each in the
knowledge base is a set of benefits, drawbacks,
applications, and, either implicitly or explicitly, a
complexity evaluation. Taking integer sorting as a
representative task, the expert can choose among several
options: Merge Sort, Comparison Sort, Radix Sort, and Quick
Sort, to name a few. Each is useful in accomplishing the
sort; however, each is also especially suited to certain
applications. Each also has variations which are applicable
to other types of sorts. The expert is familiar with these,
as well as the underlying principles which differentiate
them from one another. This allows him or her to readily
adapt these algorithms to meet different needs, lexicographic
sorting for instance.
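The kind of adaptation described above can be sketched in Python with least-significant-digit radix sort (chosen here only as an example; the text does not single out any algorithm for this). The integer version buckets numbers one decimal digit at a time; the word version keeps the same stable, position-by-position passes but buckets on characters, treating a missing character as smaller than any real one. Both functions are illustrative, not tuned implementations.

```python
def radix_sort_ints(numbers, base=10):
    """LSD radix sort for non-negative integers: one stable bucket pass per digit."""
    result = list(numbers)
    if not result:
        return result
    place = 1
    while place <= max(result):
        buckets = [[] for _ in range(base)]
        for n in result:
            buckets[(n // place) % base].append(n)
        result = [n for bucket in buckets for n in bucket]
        place *= base
    return result


def radix_sort_words(words):
    """The same idea adapted to lexicographic order: one stable pass per
    character position, from the last position to the first. A word too short
    to have a character at a position goes in bucket 0, below every real
    character, so 'car' sorts before 'cart'."""
    result = list(words)
    width = max((len(w) for w in result), default=0)
    for pos in range(width - 1, -1, -1):
        buckets = {}
        for w in result:
            key = ord(w[pos]) + 1 if pos < len(w) else 0
            # Words are appended in their current order, so each pass is stable.
            buckets.setdefault(key, []).append(w)
        result = [w for key in sorted(buckets) for w in buckets[key]]
    return result


print(radix_sort_ints([170, 45, 75, 90, 802, 24, 2, 66]))
# [2, 24, 45, 66, 75, 90, 170, 802]
print(radix_sort_words(["cart", "car", "apple", "bat", "ca"]))
# ['apple', 'bat', 'ca', 'car', 'cart']
```

Because each pass is stable, ties on the current position keep the order set by the less significant positions already processed; that principle carries over unchanged from digits to characters, which is what makes the adaptation straightforward for someone who understands it.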

Like algorithms, data structures have many variations.
The expert is familiar with these and with the underlying
principles behind their design as well. This allows easy
modification to meet new requirements and aids the expert in
recognizing design flaws such as lack of flexibility or
expandability. The expert also has knowledge of algorithms
and can correlate a given data structure with an algorithm
or group of algorithms for a specific application. The
expert can also relate information on programming languages
to data structures, evaluating the relative ease with which
specific structures can be used and manipulated.

Programming languages are, to some degree, familiar to
all programmers, whatever their skill level. An expert,
however, is versed not only in the syntax and semantics of
several languages, but also in the advantages and
disadvantages of one language design, or particular machine
implementation, over another. While the choice of language
is not an option for the programmer tasked with maintaining
or debugging, the particular design and implementation
features play an important role when porting software from
one machine to another.

Knowledge of language design and implementation also
allows the expert to make judgements about software
efficiency and memory needs. This knowledge also allows for
identifying potential trouble spots, usually avoiding
analysis of the entire program. This is particularly
important when evaluating possible effects of a
modification.

Information about algorithms also contributes to the
knowledge of languages. As most languages have built-in
functions, the expert can evaluate the particular algorithms
used to implement these. This evaluation adds to his or her
knowledge base of programming languages, aids in efficiency
analyses, and is useful in predicting the accuracy of
results. Supported by this knowledge, an expert may choose
to substitute other routines using more applicable
algorithms, for such things as increased accuracy in
calculations, more efficient device drivers, or faster
access to secondary storage. He or she might also choose to
replace programmed functions with ones built into the
language, for the same reasons.

Knowledge regarding logic is important in two ways.
First, it enables the expert to learn the specific
implementation of control statements in a programming
language, adding this to his or her knowledge base. Second,
it aids in evaluating the flow of control in a given piece
of software. Both help in analysing the efficiency of the
software. Consider the following IF-THEN statement:

IF ( A > 10 ) OR ( B < 15 ) THEN C = D

The expert would know, or could test, whether or not the
second comparison is executed regardless of the result of
the first. Taking advantage of this type of information
could greatly improve the software's efficiency, saving money
and CPU time.
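Whether the second comparison runs is a property of the language. Python, used here only to illustrate, guarantees short-circuit evaluation of `or`, whereas some languages leave the choice to the compiler. The hypothetical `trace_condition` function is the kind of quick test the text mentions: it records which comparisons of the IF statement actually execute.

```python
def trace_condition(a, b):
    """Evaluate (A > 10) OR (B < 15), recording which comparisons actually ran."""
    executed = []

    def a_gt_10():
        executed.append("A > 10")
        return a > 10

    def b_lt_15():
        executed.append("B < 15")
        return b < 15

    result = a_gt_10() or b_lt_15()  # Python's `or` stops once the left side is true
    return result, executed


print(trace_condition(20, 99))  # (True, ['A > 10'])
print(trace_condition(5, 3))    # (True, ['A > 10', 'B < 15'])
print(trace_condition(5, 99))   # (False, ['A > 10', 'B < 15'])
```

When the language does guarantee short-circuiting, placing the cheaper or more frequently true comparison first is one way to exploit it.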

Programming design methodologies are treated differently
from other categories in the knowledge base. They cannot
be defined in specific terms, as we have done with the
others, and are seen as more of a gestalt type of knowledge.
They help the expert in analysing possible side effects,
which are, in part, a function of modularity. They play a
major role in processes to be presented later, such as
CHUNKING, SLICING, and HYPOTHESIZING.

Aside from knowledge of programming, the expert
maintainer must also know something of the specific
application area. The level or amount of information
necessary is dependent upon the modification to be
implemented. At the very least, however, the programmer
needs to know enough to be able to interpret the
documentation and program specifications in order to make a
judgement regarding potential side effects of the change.
This information is either learned information in long term
memory, which can be recalled for future tasks, transient
information used and then forgotten, or information kept as
reference using an external memory.

The view of this study is that what is contained in the
knowledge base directly affects the programmer's ability to
understand a given piece of software. Obviously, what the
programmer knows at the outset about the program's task
domain, and information related to it, will affect how
difficult it is for him or her to gain this understanding.
Extending this idea, a large disparity in knowledge level
significantly affects the programmer's level of competence
and, consequently, the relative quality of the work.
The cognitive processes which interact with this
knowledge base, in order for the programmer to achieve this
understanding, perform essentially three functions: factual
information is analysed and added to the knowledge base;
concepts and methodologies are abstracted from
documentation; and information from one category is
associated with that from another (such as correlating a
data structure with an algorithm). These functions serve to
integrate all the information available to the programmer
that applies to the task.

This knowledge base is not simply a collection of facts.
It is the organized accumulation of information into a
network reflecting semantic associations. This organization
is as important as the information itself.

B.   ORGANIZATION 

Studies of recall show that people tend to organize
information into categories and groupings. Most items or
objects in memory are members of more than one of these
categories, depending on context. A piano is a member of
the musical instrument category and, in the context of
musical instruments, can be sub-categorized as a keyboard
instrument. It is also a member of the category which
includes hutch and dresser when viewed as a heavy piece of
furniture.
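
The context-dependent membership described above can be
sketched as a small lookup structure. This is a toy
illustration only: the item list, the category names, and the
recall helper are hypothetical, and nothing here claims to
model how human memory actually stores categories.

```python
# Hypothetical toy model: each item belongs to several categories at once,
# and which grouping is retrieved depends on the context of the query.
MEMBERSHIP = {
    "piano": {"musical instrument", "keyboard instrument", "heavy furniture"},
    "violin": {"musical instrument", "string instrument"},
    "hutch": {"heavy furniture"},
    "dresser": {"heavy furniture"},
}


def recall(context):
    """Return, in sorted order, the items grouped together under `context`."""
    return sorted(item for item, categories in MEMBERSHIP.items()
                  if context in categories)


print(recall("keyboard instrument"))  # ['piano']
print(recall("heavy furniture"))      # ['dresser', 'hutch', 'piano']
```

The same piano is retrieved under both contexts, but it is
grouped with different companions each time.
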

Grouping by order is another observed way in which memory is
organized. A person asked to list the ingredients of a
recipe, for example, will more than likely list them in
order of their use. When asked to list items necessary to
equip a home, housewives listed these items either by
category (kitchen utensils, furniture, window coverings) or
by considering necessary items room by room [Ref. 4: pp. 8-11].

The evidence provided by these studies supports the
hypothesis that memory is organized dynamically, based on
the context of the stimulus. It also implies that this
organization makes use of information clustering: information
elements related by context 'migrate' toward certain key
elements or toward one another. In either case, this
clustering strengthens associations in context between these
information elements, enhancing recall. As explained in a
later chapter, this enhancement aids cognition by making
pertinent information readily available to short term memory,
while 'blocking' irrelevant associations involving these same
elements.

Because these groupings are determined by context, the
amount of information in the knowledge base associated with
each element has a bearing on its categorization. The greater
the amount of associated knowledge, the more refined the
groupings can be. As more knowledge is gained and this
refinement continues, new clusters are formed to replace
less refined ones, and the association between any two
elements becomes more specific. This, in turn, results in a
reorganization of memory.

The studies cited here involve simple element lists.
However, this idea is easily extended to more complex
information elements, such as concepts, ideas, and
abstractions, which are themselves clusters of information.
The implication throughout this chapter is that different
knowledge categories or domains are used best when
integrated. How the contents and organization of memory
relate specifically to the expert, and how this integration
is accomplished, is addressed in the following chapter.

IV.   THE PROCESSES

SHNEIDERMAN and MAYER conjecture that, to facilitate
program comprehension:


"the programmer, with the aid of his or her syntactic
knowledge of the language, constructs a multileveled
internal semantic structure to represent the program."
[Ref. 11]


The present study has identified, in the context of software
maintenance, three major complementary cognitive processes,
supported by certain lesser ones, that are used to accomplish
this. Further, it is the tenet of this study that the entire
program need not be represented in memory, but only that
part which the programmer determines to be of interest.

The descriptions of these processes have been formulated
from observed programmer behavior. The ideas presented are
extensions of theories based on empirical data from limited
testing. The introduction and subsequent treatment of these
ideas in the literature has been, in many cases, artfully
vague, with researchers characteristically relying on
intuitive understanding through example. Therefore, although
an attempt is made here to define these processes more
clearly, the next chapter presents a scenario exemplifying
the application of each.

A.   CHUNKING

The cognitive process known as 'chunking' is a learned
skill that enables a programmer to encode information in
such a way that a group of information elements can be
represented and processed as a single element in short term
memory [Ref. 7]. As mentioned previously, short term memory
is where information processing occurs, and it is
characterized as having a limited capacity. This grouping or
organizing of information allows programmers to operate on
'chunks' of associated information rather than on single
items, which gives the programmer a broader perspective of
the task.

Chunking is a very dynamic process in terms of the
knowledge base. A chunk is created when an association is
formed between an encoded item in short term memory and its
corresponding information cluster in long term memory. This
cluster is the result of a reorganization of memory based on
the context of the stimulus which initiated the chunking
process. It can be added to or deleted from, based on the
results of partial completion of the task for which it was
created, or as information regarding the task is learned
through other processes.

Chunking associations may also be formed between the
encoded item and information in external memories. These
associations may access information directly, or might
simply guide the programmer to a reference in which the
necessary information is contained. In either case, they
allow the programmer to use transient or task-specific
information. At the same time, they relieve the programmer
of the burden of having to learn the information so that it
can be added to the cluster, or of having to store it in
short term memory before it is needed.

The amount of information represented by a chunk is
arbitrary [Ref. 12]. Its size depends on how much associated
information is contained in the knowledge base, and on the
extent to which external memories are used. The results of
research by MILLER and others indicate that the number of
items used or stored in short term memory is relatively
constant. From this it can be concluded that the number of
chunks which can be processed is independent of chunk size
[Ref. 13: pg. 177, Ref. 9: pg. 44]. Thus, chunking
effectively increases the capacity of short term memory as
it relates to information processing.
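
The claim that chunking raises effective capacity without
changing the number of items held can be made concrete with
a minimal sketch. It assumes a fixed slot count (7 by default,
purely as an illustrative nod to MILLER; the class and field
names are hypothetical) and counts how many underlying
information elements the held chunks stand for.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Chunk:
    label: str
    cluster: List[str]  # long-term-memory elements this chunk stands for


@dataclass
class ShortTermMemory:
    capacity: int = 7  # illustrative fixed number of slots (an assumption)
    slots: List[Chunk] = field(default_factory=list)

    def hold(self, chunk: Chunk) -> bool:
        """Place a chunk in a free slot; refuse it when all slots are full."""
        if len(self.slots) >= self.capacity:
            return False
        self.slots.append(chunk)
        return True

    def elements_represented(self) -> int:
        """Total number of underlying elements reachable through held chunks."""
        return sum(len(chunk.cluster) for chunk in self.slots)


# Same capacity, very different reach: single tokens versus one larger chunk.
novice = ShortTermMemory(capacity=2)
novice.hold(Chunk("I", ["I"]))
novice.hold(Chunk("=", ["="]))
expert = ShortTermMemory(capacity=2)
expert.hold(Chunk("loop over an array",
                  ["REPEAT", "I = I + 1", "READ X[I]", "UNTIL X[I] = 999"]))
print(novice.elements_represented(), expert.elements_represented())  # 2 4
```

The novice's memory is already full, while the expert still
has a free slot and reaches twice as many elements.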

Besides allowing more information to be handled in short
term memory, chunking also gives the programmer quick access
to specific information which is part of the chunk. The
reason is that chunks, representing information clusters,
enhance recall of that information. All knowledge associated
with the chunk has effectively been accessed, and can be
thought of as staged for recall. This can best be explained
using a semantic net representation.

When the chunk is created, a reorganization of the
knowledge base takes place, and information migrates,
forming a high density node cluster. Again, the size of
this cluster depends on the extent of the knowledge base.
This density decreases the length of nodal links, resulting
in a shorter walk from the initial access node, or capital,
of the cluster to the desired information element. The
association between the encoded item and the knowledge base
is one example of the 'shortcut' described earlier, and
links short term memory to the capital of the cluster.
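
The shortening of the walk can be illustrated with an
ordinary breadth-first search over a toy semantic net. The
node names below are hypothetical, and a single added link
stands in for the chunk's association between the encoded
item and the capital of the cluster; this is a sketch of the
idea, not a model of actual memory distances.

```python
from collections import deque
from typing import Dict, Optional, Set

Net = Dict[str, Set[str]]


def link(net: Net, a: str, b: str) -> None:
    """Add an undirected association between two nodes."""
    net.setdefault(a, set()).add(b)
    net.setdefault(b, set()).add(a)


def walk_length(net: Net, start: str, goal: str) -> Optional[int]:
    """Number of links on the shortest walk from start to goal (breadth-first)."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for nxt in net.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


net: Net = {}
for a, b in [("encoded item", "loop"), ("loop", "iteration"),
             ("iteration", "index variable"), ("index variable", "array"),
             ("array", "array bounds")]:
    link(net, a, b)
print(walk_length(net, "encoded item", "array bounds"))  # 5

# Forming the chunk: a direct 'shortcut' from the encoded item to the capital.
link(net, "encoded item", "array")
print(walk_length(net, "encoded item", "array bounds"))  # 2
```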

In addition, the perspective has been identified, and
associations between codes that are not in context have been
deemphasized. All the information represented by the chunk
is now just beyond the programmer's consciousness, waiting to
be recalled. The encoded item, representing a group of
knowledge, can therefore be processed, with specific items
associated with the chunk rapidly recalled for use when
necessary.

Some researchers, such as KINTSCH, suggest that chunks,
once formed, can be permanently stored in long term memory
[Ref. 12: pg. 175]. This idea is inconsistent with the
presentation here, and research for this study has uncovered
no data to support the hypothesis. KINTSCH himself
differentiates between what a chunk is in short and in long
term memory. His idea of stored chunks closely corresponds
to the earlier presentation of information clustering. As
it is the contention of this study that a chunk exists only
so long as it is under the programmer's attention, this
notion of permanently stored chunks is disregarded.


B.   SLICING

Expert programmers break large unfamiliar programs into
smaller coherent pieces in order to gain an understanding of
their function and/or design. Often, these pieces are
determined by the original writers of the code. They are
identified as blocks of code in the form of subroutines,
procedures, functions, and the like. Identification is
usually explicit, and the pieces are written into the source
as contiguous lines of program text. One can think of these
as functional pieces of the program.

Experts also routinely partition programs in ways that
do not conform to textual, modular, or functional structure,
permitting multiple views of the same code. Unlike
functional pieces, in which each line of code serves a single
function and purpose, this type of division allows lines of
code to be viewed from different perspectives, associating a
single line of code with more than one purpose. The
construction of these views is what WEISER, who first
proposed the idea, calls 'program slicing'. The process is
used to strip from a program the statements which do not
influence a specific behavior, or slicing criterion. The
result is an abstract representation of the program as
viewed from the perspective of that specific behavior. This
group of statements, usually associated with a single
variable, is called a program slice [Ref. 14: pg. 43y,
Ref. It: pg. 446].
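
A program slice in this sense can be approximated, for
straight-line code only, by a backward pass over the
statements while tracking which variables are still relevant.
The sketch below is a deliberate simplification: it assumes
each statement is annotated by hand with the variables it
defines and uses, takes the criterion to be a single variable
at the end of the program, and ignores control dependence, so
it is not WEISER's full algorithm. The Stmt type and the
sample statements are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Stmt:
    text: str
    defs: FrozenSet[str]  # variables assigned by the statement
    uses: FrozenSet[str]  # variables read by the statement


def stmt(text: str, defs: str = "", uses: str = "") -> Stmt:
    return Stmt(text, frozenset(defs.split()), frozenset(uses.split()))


def backward_slice(program: List[Stmt], variable: str) -> List[str]:
    """Statements of straight-line code that can influence `variable` at the end."""
    relevant = {variable}
    kept = []
    for s in reversed(program):
        if s.defs & relevant:
            kept.append(s.text)
            # This definition satisfies the need for s.defs; its inputs become relevant.
            relevant = (relevant - s.defs) | s.uses
    return list(reversed(kept))


program = [
    stmt("SUM = 0", "SUM"),
    stmt("COUNT = 0", "COUNT"),
    stmt("READ G", "G"),
    stmt("SUM = SUM + G", "SUM", "SUM G"),
    stmt("COUNT = COUNT + 1", "COUNT", "COUNT"),
    stmt("PRINT 'DONE'"),
    stmt("AVG = SUM / COUNT", "AVG", "SUM COUNT"),
]
print(backward_slice(program, "SUM"))
# ['SUM = 0', 'READ G', 'SUM = SUM + G']
```

Slicing on AVG instead keeps every statement except the PRINT
line, which shows how the slice grows or shrinks with the
criterion.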

Slicing is important in maintenance because typically
only a subset of the program's behaviors is being improved
or replaced. By eliminating non-influential code, the
maintainer's job is made simpler: he or she can then deal
with a much smaller 'program'. While this program may not
be syntactically correct, it is semantically correct for the
behavior of interest.

Also, the entire piece of software need not be sliced.
If a point in the flow of control can be identified which
bounds the slicing criterion, then only the part of the code
still to be executed from that point need be sliced. This
further reduces the programmer's task.

Two key areas of the knowledge base are especially
influential in determining the effectiveness of a
programmer's slicing ability. Knowledge of programming logic
allows the maintainer to easily identify the bounds of a
specific behavior: with an extensive knowledge base, he or
she can trace through the program's flow of control easily
and accurately, recognizing particular logic features of the
language. In addition, the expert's in-depth knowledge of the
programming language gives him or her the ability to readily
identify lines of code which impact the slicing criterion.
For example, familiarity with how data is passed, and whether
or not it is altered by code or simply used and returned
without change (i.e., pass by reference, value, or name),
could greatly affect the size of the slice.

The extent to which experts employ slicing seems to
depend on the program. Testing by WEISER shows that factors
influencing the use of slicing are code size, structure, and
ease of understanding [Ref. 15: pp. 45S-461]. This suggests
that experts find slicing most effective on poorly structured
programs, and less so on those which are well designed and
make use of modules, comments, and mnemonics. Effectiveness
here is a relative measure of the amount of work eliminated
and/or information gained by slicing.

The work by WEISER also demonstrates that expert
programmers independently develop their own style of
slicing. This does not preclude teaching its principles to
less able programmers, but it points out the process's
dependence on the knowledge and experience of the
individual. It also indicates that slicing is a subjective
process and cannot presently be fully automated. For the
interested reader, however, WEISER does describe algorithms
for approximating slices and discusses the effectiveness of
two automatic slicing tools [Ref. 14].

C.   HYPOTHESIS PROCESS

The third, and perhaps most powerful, process used by
experts is hypothesis generation, refinement, and
verification. It is a top-down process which allows for
maximum utilization of the programmer's knowledge base, the
overall depth of which determines the effectiveness of the
process. It involves the generation, based on information
in the knowledge base, and the subsequent refinement and
verification of hypotheses embodying the programmer's
suppositions about how the code was designed and written.
As more and more information about the software is
processed, a hierarchy of these hypotheses is constructed.

This hierarchy is built quasi depth-first, because a
programmer tends to focus on one area, forming a cascade of
refinement hypotheses through several levels before shifting
his or her attention. The programmer does, however, remain
cognizant of the other areas. Therefore, information
encountered while refining the current area of interest is
often used to form hypotheses relating to these other areas
as well.

The hierarchical structure can be thought of as defining
levels of understanding. The greater the depth, the more
the programmer has refined his or her understanding of the
software. By building this hierarchy, the programmer is
creating an internal representation of the program,
independent of any programming language. The goal, or ideal,
is that at any level of understanding the programmer should
be able to produce a functionally equivalent program in any
language with which he or she is familiar.

The title of the program, or a succinct presentation of
the task for which the software was written, usually
suggests enough information for the programmer to generate a
hypothesis about the general flow of the program. This
hypothesis would incorporate expected input and output types
with a corresponding class or group of possible data
structures. It would also include classes of algorithms and
abstract logical constructs, with the programmer essentially
forming an overview of how the program might work. Note that
these are classes and not specific elements.

As more information about the program is processed,
these ideas are refined by generating other, more specific
hypotheses based on new, more focused expectations. As
mentioned, a hierarchy begins to form, each level further
refining the expectations used to generate the hypotheses
above it. As each new level is formed, it incorporates more
information about the program. The result is more factual
information in support of these hypotheses, and less
supposition based on previous knowledge of similar tasks.
This is not to say that knowledge base information is
replaced by that newly learned about the task. Rather, facts
about the problem are used to verify, whenever possible, the
supposed information. Only when a contradiction occurs is
this information replaced. Obviously, this process depends
on the programmer's having seen similar problems before. It
seems appropriate, therefore, to digress for a moment to
address this idea of sameness, or analogy.

As was mentioned before, information in memory is
organized into groups based on certain parameters or
constraints. How, in fact, this grouping is accomplished is
still not understood; however, it does occur. As
associations are virtually limitless, it seems logical to
assume that groupings are as well. Similar problems could
therefore be grouped, and an abstract set of circumstances
formed to encompass the dominant characteristics of the
group. This idea is similar to that of a frame. Then, as
problems are introduced, they are compared against these
dominant characteristics. If the characteristics match, the
problem is considered analogous.

As this matching process seems a mammoth task as
presented, consider the reduction of work if these sets of
circumstances were grouped by single characteristics,
incorporating confidence levels, or another method of
rating, to distinguish the most from the least dominant in
the set. This would create stronger and weaker associations,
leading to the most probable set first, analogous to an
electron following the path of least resistance. This type
of organization could greatly reduce the amount of searching
necessary to identify this class of situations.
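
The reduction in search can be sketched by indexing known
problem classes by single characteristics, each with a
confidence weight, so that only the characteristics actually
observed are looked up and the candidates come back strongest
first. The problem classes, characteristics, and weights below
are invented for illustration, and summing weights is just one
simple rating scheme, not one the study prescribes.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical problem classes: characteristic -> confidence weight.
KNOWN_PROBLEMS: Dict[str, Dict[str, float]] = {
    "grade averaging": {"numeric scores": 0.9, "computes mean": 0.9,
                        "maps to letter": 0.6},
    "payroll": {"numeric scores": 0.3, "computes totals": 0.9,
                "prints checks": 0.8},
    "inventory": {"computes totals": 0.5, "tracks stock": 0.9},
}


def build_index(problems: Dict[str, Dict[str, float]]):
    """Group problem classes under each single characteristic."""
    index = defaultdict(list)
    for name, traits in problems.items():
        for trait, weight in traits.items():
            index[trait].append((name, weight))
    return index


def rank_analogies(index, observed: Set[str]) -> List[Tuple[str, float]]:
    """Score only the classes reachable from observed traits; strongest first."""
    scores: Dict[str, float] = defaultdict(float)
    for trait in observed:
        for name, weight in index.get(trait, ()):
            scores[name] += weight
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)


index = build_index(KNOWN_PROBLEMS)
print(rank_analogies(index, {"numeric scores", "computes mean"}))
```

Here "grade averaging" ranks first and "payroll" second, while
"inventory" is never examined because none of its
characteristics were observed.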

When these analogies exist, their benefits are exploited
in generating hypotheses. As stated earlier, the programmer
makes maximum use of his or her knowledge base. This is
accomplished by relying on previously learned information
regarding a general solution already familiar to him or her.
In this case, the specifics of the software solution need
only be learned if and when they are needed and differ from
those of the general one. This is a much reduced task,
relative to learning the entire solution (or program) when
no such analogies exist in the knowledge base.

Returning to the discussion of hypotheses, the
hierarchical structure can be explained easily by once again
using a semantic net representation. Each hypothesis can be
thought of as a frame. Each slot value of a frame would
either be an information element or a frame itself, the
latter necessarily more specific than the one whose slot it
fills.

Initially, all frames (hypotheses) would contain either
default or normal values. As more information regarding the
software is processed, these values would be confirmed or
replaced. These new values could be frames, representing
still more specific hypotheses. Normal values, when
contradicted, are replaced by exceptions specific to the
problem at hand.
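
A hypothesis frame whose slots start with default values and
are later confirmed, refined into more specific frames, or
replaced by exceptions can be sketched as below. The class
names, status labels, and slot contents are hypothetical,
loosely following the grade-averaging scenario of the next
chapter; unverified simply lists the slots still holding
unconfirmed defaults.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class Slot:
    value: Union[str, "Frame"]
    status: str = "default"  # "default", "confirmed", "refined", or "exception"

    def confirm(self) -> None:
        self.status = "confirmed"

    def refine(self, sub_frame: "Frame") -> None:
        """Replace the value with a more specific hypothesis (a sub-frame)."""
        self.value, self.status = sub_frame, "refined"

    def contradict(self, exception: str) -> None:
        """A normal value was contradicted: store the problem-specific exception."""
        self.value, self.status = exception, "exception"


@dataclass
class Frame:
    hypothesis: str
    slots: Dict[str, Slot] = field(default_factory=dict)


def unverified(frame: Frame, prefix: str = "") -> List[str]:
    """Paths of slots that still hold default values, searched depth-first."""
    paths = []
    for name, slot in frame.slots.items():
        path = prefix + name
        if slot.status == "default":
            paths.append(path)
        if isinstance(slot.value, Frame):
            paths.extend(unverified(slot.value, path + "."))
    return paths


top = Frame("program averages grades", {
    "INPUT": Slot("data associated with grades"),
    "OUTPUT": Slot("printed report"),
    "PROCESS": Slot("averaging algorithms"),
})
top.slots["INPUT"].refine(Frame("student ID and grades are input", {
    "STUDENT IDENT": Slot("any student identification"),
    "GRADE": Slot("numeric grades in a list"),
}))
top.slots["OUTPUT"].contradict("letter grades are stored, not printed")
print(unverified(top))
# ['INPUT.STUDENT IDENT', 'INPUT.GRADE', 'PROCESS']
```
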

Each introduction of new information causes a
reorganization of memory due to the change in context. This
reorganization would make use of confirmed information, old
or new, and may cause a change in default or normal values
not yet verified. If this change in context occurs at a low
level of the hierarchy, the programmer's perspective will
change only slightly. If, however, the change affects slot
values in the top levels, reorganization of a large subtree
might occur, giving the programmer a significantly different
view of the problem. The view could also change if the
programmer chooses to shift his or her attention from the
overall view to a more refined hypothesis, then focusing on
a subtree of the hierarchy. This would have the effect of
emphasizing the details contained in this subtree and
'chunking' the remainder. The hypothesis hierarchy is
therefore dynamic, changing with every shift in context.

Verification can take place at any time. It usually
occurs when the programmer reaches a level of understanding
about the behavior of the program that he or she wishes to
confirm. This can be because the programmer has reached a
level of understanding believed adequate for the task he or
she needs to perform, or it might simply be to validate
certain hypotheses before continuing. One reason for
intermediate validation is that it lessens the effects of
discovering an invalid hypothesis or contradiction.

For verification, the hypotheses forming the leaves of
the tree are tested against the code. Two conditions are
necessary for verification of the hierarchy. First, code
corresponding to the hypothesis being verified must be in
the program. Second, all code must be accounted for by one
of the hypotheses. If either of these conditions fails, the
structure is reorganized to reflect this and any new
information gained from it.
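
The two verification conditions can be expressed as a simple
check, assuming the programmer has already mapped each leaf
hypothesis to the line numbers of the code that supports it.
The mapping format and the verify function are hypothetical;
the example numbers the five lines of the input slice from
the scenario in the next chapter 1 to 5.

```python
from typing import Dict, List, Set, Tuple


def verify(leaves: Dict[str, Set[int]],
           code_lines: Set[int]) -> Tuple[List[str], Set[int]]:
    """Check both conditions; return (unsupported hypotheses, unexplained lines).

    Condition 1: every leaf hypothesis has corresponding code.
    Condition 2: every line of code is accounted for by some hypothesis.
    Verification succeeds only when both returned collections are empty.
    """
    unsupported = [h for h, lines in leaves.items() if not (lines & code_lines)]
    explained = set().union(*leaves.values())
    unexplained = code_lines - explained
    return unsupported, unexplained


slice_lines = {1, 2, 3, 4, 5}
leaves = {
    "each student is input": {1},
    "grades are input": {4},
    "grades are numeric": {5},
    "grades are held in a list": {2, 3, 4, 5},
}
print(verify(leaves, slice_lines))  # ([], set())
```

Dropping the list hypothesis would leave lines 2 and 3
unexplained, which is the signal to reorganize the structure.
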

V.   SCENARIO

A scenario is now presented to help exemplify how each
process applies to the task of program comprehension. It is
meant to give the reader an intuitive understanding of the
application and effects, as well as the mechanisms
underlying these cognitive processes. The reader should
also gain an understanding of the interrelationships between
the processes, the knowledge base, and information relating
specifically to the program. It is the collective use of
these which gives the expert his or her superior skills.
For simplicity, a structured program is assumed, as well as
an ALGOL-like programming language. Again, semantic nets
are used to represent memory organization.

The program used for this scenario is one which computes
averages of student grades and outputs a letter grade for
each student. It is a fairly structured program with
adequate documentation, and it uses mnemonics but no
comments in the source code.

A.   A WALK-THROUGH

Suppose a programmer is given a program that he or she
has never seen before and is asked to perform some
modification to it. Further suppose that to make this
modification, an overall understanding of the program is
necessary. He or she most likely begins by looking at the
documentation.

After reading a small part of the documentation, perhaps
a phrase or sentence, the programmer forms a hypothesis: he
or she has ascertained that the program averages student
grades. This defines a context, and a reorganization of
memory takes place. This reorganization results in a large
information cluster, forming a frame. It contains slots
such as INPUT DATA, OUTPUT DATA, and PROCESSES.

The value of the INPUT DATA slot, based on the
programmer's knowledge of how school grades are arrived at,
is a cluster of possible types or classes of data. At this
level, these would include every type of data in his or
her knowledge base that the programmer associates with
school grades, as well as all possible data structures
associated with them. The values of the other slots would
be of a similar nature.

So by simply reading a single phrase, 'computes student
grade averages', the programmer has constructed an internal
representation of the program. He or she expects that it
takes some input data, processes this data, and outputs the
result. In addition, he or she has identified an input
domain, an output domain, and a domain of algorithms on
which the processing of the data is assumed to be based.
While this representation is certainly not specific enough
to enable the programmer to do any useful work, a level of
understanding has been achieved.

Further reading of the documentation reveals that each
student's grades will be read in and summed, and the average
converted to a letter grade and stored. This information
suggests many more specific data and algorithmic classes,
and several levels of hypotheses are formulated. Presuming
that, at this point, the programmer begins to develop
hypotheses in a quasi depth-first order, focusing on input,
one hypothesis would be that grades are read in as numbers.
Another might be that each student's identification is input
in conjunction with his or her grades. The grade data
hypothesis is then refined, forming a lower level hypothesis
that grades will be represented as integers and handled as a
list. Note that at this point the programmer is not
interested in what representation is used for student
identification, possibly because hypotheses about the
processing of the data suggest that the identification data
will be used but not altered, so specific typing will not be
necessary.

In memory, each hypothesis is represented as a frame
with ordered slots. This ordering, if relevant, is based on
the expected or confirmed ordering of the representative
information in the program; otherwise it is arbitrary. For
example, the ordering of algorithms would be important in
understanding the program, whereas the ordering of data
classes in the frames created from the input hypotheses (for
example, the one representing the hypothesis that both grades
and student identification are input) is not important for
program understanding. If subsequent analysis reveals that
a specific ordering is necessary, the frame would be
reorganized to reflect this, because of the new context.

The value of each slot is an information cluster
representing a knowledge domain, as frames representing
hypotheses use classes of information and not specific
elements. The cluster is formed based on the context
defined by the hypothesis which the frame or slot
represents. The initial hypothesis' INPUT slot has, as a
value, a cluster representing all data types or classes that
the programmer associates with grades. When the subsequent
hypotheses are formed, defining the input as STUDENT IDENT
and GRADE, this cluster is reorganized into a two-slot
frame, each slot representing a sub-cluster of the original.
The value of the STUDENT IDENT slot becomes all possible
representations by which students can be identified, and the
value of the GRADE slot becomes the cluster of all possible
classes of grade representation contained in the knowledge
base. Any elements or nodes of the original cluster not
associated with either of these new clusters are not
'visible' from this frame down, similar to the idea of
scoping in some programming languages. So on one level,
there is a single cluster representing the hypothesis as a
grouping of all possible input data classes, while on
another level, this same information, or a subset of it, is
viewed as two separate clusters. This reorganization of
information occurs because of the change in context when the
subsidiary hypotheses are introduced.
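
The scoping-like effect of splitting a cluster into
sub-clusters can be sketched with plain set operations. The
cluster contents and the membership sets for each new slot are
hypothetical; any element of the original cluster that falls
into no sub-cluster is reported as no longer visible from the
new frame down.

```python
from typing import Dict, Set, Tuple


def split_cluster(cluster: Set[str],
                  associations: Dict[str, Set[str]]) -> Tuple[Dict[str, Set[str]], Set[str]]:
    """Reorganize a cluster into per-slot sub-clusters.

    Returns the sub-cluster for each new slot and the elements that belong to
    none of them (and so are hidden below the new frame).
    """
    # Intersecting with the original cluster keeps each sub-cluster a subset of it.
    sub_clusters = {slot: cluster & members for slot, members in associations.items()}
    visible = set().union(*sub_clusters.values())
    return sub_clusters, cluster - visible


input_cluster = {"student name", "student number", "numeric score",
                 "letter grade", "attendance record"}
sub, hidden = split_cluster(input_cluster, {
    "STUDENT IDENT": {"student name", "student number"},
    "GRADE": {"numeric score", "letter grade", "pass/fail mark"},
})
print(sorted(sub["GRADE"]), sorted(hidden))
# ['letter grade', 'numeric score'] ['attendance record']
```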

The  programmer  has  now  increased  his  or  her 
understanding  of  the  program.  In  addition  to  what  was 
expected  based  on  the  original  hypothesis,  the  programmer 
now  also  expects  that: 

-  grades  are  numerical 

-  each  student's  set  of  grades  is  processed  separately 

-  the  grades  are  initially  input  into  a  list  structure 

-  the  grades  are  summed  and  averaged 

-  each  student  is  identified  with  his  or  her  grades 

-  a  mapping  takes  place  from  average  to  letter 

-  student ID and corresponding letter grade are stored

The figure below, 'Memory Representation of Program
(Input)', shows this representation, focusing on the input
subtree of the hypothesis hierarchy. Each level can be
thought of as a level of understanding. It should be noted
that, at this point, no verification has taken place, and
this level of understanding is contingent on the correctness
of the hypotheses formed. However, this understanding is
not appreciably diminished unless the erroneous hypothesis
is located in a top level of the hierarchy.

Continuing to focus on input, in order to verify this
representation the programmer needs to slice the source code
using input behavior as the criterion. Then, each line of
code in the slice must be mapped to a leaf-frame or slot of
the input subtree. Note that these leaf-frames or slots do
not all have to be on the same level.


[Figure: Memory Representation of Program (Input). Hypothesis
nodes, roughly from the top of the hierarchy down: PROGRAM
AVERAGES GRADES; TAKES INPUT, PROCESSES AND OUTPUTS RESULTS;
INPUT DATA ASSOCIATED WITH STUDENT GRADES; STUDENT ID DATA
and STUDENT GRADES; LIST CLASS D.S. and INTEGER CLASS D.S.]
Assume the following is the result of the slicing
process:

READ STUDENT
REPEAT
    I = I + 1
    READ STUD_GRADE[I]
UNTIL STUD_GRADE[I] = 999
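For readers who wish to execute the slice, one possible Python rendering follows. It assumes the input arrives as whitespace-separated tokens (the pseudocode does not say how input is delivered), and the function name `read_student_record` is hypothetical. As in the pseudocode, the 999 sentinel is read and stored before the loop test, so it ends up as the last element.

```python
from typing import Iterator

SENTINEL = 999  # end-of-grades marker used in the pseudocode


def read_student_record(tokens: Iterator[str]) -> tuple[str, list[int]]:
    """Read one student ID followed by grades up to and including the sentinel."""
    student = next(tokens)                      # READ STUDENT
    stud_grade: list[int] = []
    while True:                                 # REPEAT
        stud_grade.append(int(next(tokens)))    #   I = I + 1; READ STUD_GRADE[I]
        if stud_grade[-1] == SENTINEL:          # UNTIL STUD_GRADE[I] = 999
            break
    return student, stud_grade


record = read_student_record(iter("S17 90 80 70 999".split()))
print(record)  # ('S17', [90, 80, 70, 999])
```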

The programmer now attempts to verify the hypotheses against
the code. The READ STUDENT line alone verifies the
hypothesis that each student is input. Verifying the two
hypotheses associated with grades is slightly more
complicated. The READ STUD_GRADE[I] statement would be
adequate to verify the hypothesis that student grades are
input. However, it does not confirm that the grades have a
numerical representation. To confirm this, if no declaration
statement exists, the programmer must analyse the behavior of
the variable. The code resulting from the slice on input is
itself sliced, this time on STUD_GRADE[I]. The UNTIL
STUD_GRADE[I] = 999 statement becomes the only other line in
the slice.
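The second slice, on STUD_GRADE[I], can be approximated by a purely textual filter that keeps only the lines mentioning the variable. This is a deliberate simplification: a slice computed from data and control dependences would be more precise, and this sketch computes neither. The helper name `textual_slice` is hypothetical.

```python
import re

input_slice = [
    "READ STUDENT",
    "REPEAT",
    "I = I + 1",
    "READ STUD_GRADE[I]",
    "UNTIL STUD_GRADE[I] = 999",
]


def textual_slice(lines: list[str], variable: str) -> list[str]:
    """Keep only the lines that mention `variable` as a whole identifier.

    A crude textual filter, not a dependence-based slice.
    """
    pattern = re.compile(rf"\b{re.escape(variable)}\b")
    return [line for line in lines if pattern.search(line)]


print(textual_slice(input_slice, "STUD_GRADE"))
# ['READ STUD_GRADE[I]', 'UNTIL STUD_GRADE[I] = 999']
```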

The programmer recognizes the UNTIL statement as a
compare-and-branch operation and notes that the variable is
compared to a number. His or her knowledge of the
programming language is extensive enough to realize that 999
must be a number and not a string. Also, he or she knows
that if a number is compared to anything but another number,
a 'type mismatch' occurs. Therefore, STUD_GRADE[I] must be
a number. This verifies the first slot of the frame.
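The programmer's inference, that a variable compared against a numeric literal must itself be numeric, can be expressed as a small heuristic. The regular expression and the function `infer_type` are assumptions made for illustration; they handle only comparisons of the form `NAME[...] op literal` and would miss many real cases.

```python
import re

# NAME, optional [index], comparison operator, literal operand (toy grammar).
COMPARISON = re.compile(r"(\w+)(?:\[\w+\])?\s*(=|<>|<|>)\s*(\S+)")


def infer_type(lines: list[str], variable: str) -> str:
    """Guess a variable's type from comparisons against literals."""
    for line in lines:
        for name, _op, operand in COMPARISON.findall(line):
            if name != variable:
                continue
            if re.fullmatch(r"-?\d+(\.\d+)?", operand):
                return "number"   # compared with a number, else a type mismatch
            if operand.startswith('"'):
                return "string"
    return "unknown"


print(infer_type(["READ STUD_GRADE[I]", "UNTIL STUD_GRADE[I] = 999"], "STUD_GRADE"))  # number
```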

The REPEAT-UNTIL block of the original slice is
recognized as a looping construct. This, coupled with the
fact that one variable inside the loop is used as an index,
allows the programmer to chunk the block as "BO III AN
ARRAY". This chunk is associated with the grade input and,
based on this context, the information cluster associated
with the grade data structure is processed. It is found to
include the class of array data structures, and so the
second slot and its corresponding hypothesis are also
verified. With all code now mapped, the entire input
representation is considered verified, as all higher level
hypotheses inherit the verification. Also, with reference
to the last verification, it should be noted that the
information cluster and hypothesis were further refined to
reflect that a particular class of list structures, the
array class, was used.
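The inheritance of verification can be sketched recursively: a leaf slot is verified once a line of code has been mapped to it, and a higher level frame is verified once everything beneath it is. The `Frame` class and its labels are hypothetical, chosen to mirror the input subtree discussed above.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Frame:
    """A hypothesis frame; leaves stand for slots checked directly against code."""
    name: str
    slots: list[Frame] = field(default_factory=list)
    checked: bool = False  # set on a leaf once a line of code is mapped to it

    def verified(self) -> bool:
        """Leaves are verified when checked; inner frames inherit verification
        once every frame beneath them is verified."""
        if not self.slots:
            return self.checked
        return all(slot.verified() for slot in self.slots)


numeric = Frame("grades are numerical")
array = Frame("grades stored in an array (list class)")
grades = Frame("student grades are input", [numeric, array])
student = Frame("each student is input")
input_frame = Frame("input data associated with student grades", [student, grades])

student.checked = True         # READ STUDENT
numeric.checked = True         # UNTIL STUD_GRADE[I] = 999 implies a number
print(input_frame.verified())  # False: the array slot is still open
array.checked = True           # indexed REPEAT-UNTIL chunked as array input
print(input_frame.verified())  # True
```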

If a contradiction does occur during verification, a walk up
the subtree takes place. Each hypothesis is checked until
one is found that the new information does not contradict. A
new hypothesis is formed at the next lower level as a
refinement of this hypothesis, and all hypotheses below this
level are reevaluated based on the new context. A similar
process takes place if information other than that
expected is found and needs to be included in the
representation. Obviously, the higher up the tree the
change takes place, the greater the memory reorganization
necessary.
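The walk up the subtree after a contradiction can be sketched over a single root-to-leaf path of hypotheses. The function `backtrack` and the `contradicted` predicate are assumptions standing in for the programmer's judgement; the returned list holds the hypotheses to be reevaluated, and it grows the higher up the surviving hypothesis lies.

```python
from typing import Callable, Sequence


def backtrack(path: Sequence[str],
              contradicted: Callable[[str], bool]) -> tuple[int, list[str]]:
    """Walk up from the deepest hypothesis on `path` (ordered root first) until
    one is found that the new information does not contradict.

    Returns the index of the surviving hypothesis and the hypotheses below it,
    which must be reevaluated in the new context.
    """
    for i in range(len(path) - 1, -1, -1):
        if not contradicted(path[i]):
            return i, list(path[i + 1:])
    # Even the top-level hypothesis is contradicted: everything is reevaluated.
    return -1, list(path)


path = [
    "program averages student grades",
    "input data associated with student grades",
    "grades are input into a list structure",
    "grades are integers",
]
# Hypothetical new information contradicts only the deepest hypothesis.
print(backtrack(path, lambda h: h == "grades are integers"))
# (2, ['grades are integers'])
```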

Up to this point, the programmer has been forming the
program representation using a top-down approach. However,
there are times when a bottom-up, inductive approach is also
necessary. This approach is usually taken when a
programmer's knowledge base regarding the task domain is
incomplete, or when atypical algorithms are used. Here
chunking plays a major role. The purpose of the next
example is to demonstrate this role, not to describe the
inductive process in detail.

Suppose the programmer is confronted with a module or
block of code about which he or she has formed no hypothesis
at a specific level. Using the grade averaging example,
assume that the programmer has no knowledge of how averages
are computed, and that the algorithm used is unknown to him
or her. The programmer now tries to understand the
algorithm by reasoning inductively about the code based on
his or her knowledge of the lower level functions performed
within it.

At the lowest level, this is accomplished by looking at
individual lines of code and assigning them interpretations
[Ref. 12]. However, because the expert's knowledge base
contains information about constructs and their uses,
certain of these lines are recognized as code involved in
the performance of a specific function. BROOKS calls these
'beacons'.

The block of code is for a standard averaging routine:

I = 1
SUM = 0
WHILE STUD_GRADE[I] <> 999 DO
    SUM = SUM + STUD_GRADE[I]
    I = I + 1
END_WHILE
AVERAGE = SUM / (I - 1)
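A Python rendering of the averaging routine follows, assuming the grades arrive in a list terminated by the 999 sentinel. Python lists are 0-based, so the index starts at 0 rather than 1; after the loop it equals the number of grades summed, whereas in the 1-based pseudocode I ends one past the last grade, so the count there is I - 1. The empty-input check is an addition not present in the pseudocode.

```python
SENTINEL = 999  # end-of-data marker from the pseudocode


def average_grades(stud_grade: list[float]) -> float:
    """Average the grades that precede the 999 sentinel."""
    i = 0        # corresponds to I = 1 in the 1-based pseudocode
    total = 0.0  # SUM = 0
    while stud_grade[i] != SENTINEL:   # WHILE STUD_GRADE[I] <> 999 DO
        total += stud_grade[i]         #     SUM = SUM + STUD_GRADE[I]
        i += 1                         #     I = I + 1
    if i == 0:
        raise ValueError("no grades precede the sentinel")
    # With 0-based indexing, i now equals the number of grades summed.
    return total / i


print(average_grades([90, 80, 70, 999]))  # 80.0
```

Like the pseudocode, this sketch assumes the sentinel is present; a list without it would run past the end and raise an IndexError.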

The programmer analysing this code recognizes the first two
lines as assignment statements and interprets them
individually. He or she now looks at the WHILE line and
recognizes it as a looping construct and a beacon for several
functional uses. The next assignment statement has the
assignment variable on both sides of the equal sign, and so
is interpreted as changing the value of SUM by performing
some operation on it, rather than simply assigning it a
value. Once the value added is recognized as an indexed
value, the programmer chunks the loop. His or her
knowledge base contains information showing that an indexed
variable added in that type of assignment statement
indicates an array summation process. So these four lines
are chunked as "SUM STUDENT GRADES". Also, the first two
lines are now chunked as "VARIABLE INITIALIZATION" based on
this new information. The last line is interpreted as an
assignment statement which computes the grade average by
dividing the sum of the grades by the number of grades
summed.
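The bottom-up chunking of the averaging routine can be caricatured with a few hard-coded 'beacons': a WHILE loop whose body contains an accumulation of the form X = X + A[I] is labelled an array summation, and a run of constant assignments is labelled variable initialization. The patterns, labels, and the function `chunk` are illustrative assumptions; unlike the programmer in the text, this sketch labels the initialization immediately rather than only after recognizing the loop. The routine is transcribed as strings, with the final division written over I - 1, the number of grades summed when I starts at 1.

```python
import re

ROUTINE = [
    "I = 1",
    "SUM = 0",
    "WHILE STUD_GRADE[I] <> 999 DO",
    "SUM = SUM + STUD_GRADE[I]",
    "I = I + 1",
    "END_WHILE",
    "AVERAGE = SUM / (I - 1)",
]

SIMPLE_ASSIGN = re.compile(r"^(\w+) = -?\d+$")            # X = constant
ACCUMULATE = re.compile(r"^(\w+) = \1 \+ \w+\[\w+\]$")    # X = X + A[I]


def chunk(lines: list[str]) -> list[tuple[str, list[str]]]:
    """Group lines into labelled chunks using a few hard-coded beacons."""
    chunks = []
    i = 0
    while i < len(lines):
        if lines[i].startswith("WHILE"):
            end = lines.index("END_WHILE", i)
            body = lines[i + 1:end]
            label = ("ARRAY SUMMATION" if any(ACCUMULATE.match(b) for b in body)
                     else "LOOP")
            chunks.append((label, lines[i:end + 1]))
            i = end + 1
        elif SIMPLE_ASSIGN.match(lines[i]):
            start = i
            while i < len(lines) and SIMPLE_ASSIGN.match(lines[i]):
                i += 1
            chunks.append(("VARIABLE INITIALIZATION", lines[start:i]))
        else:
            chunks.append(("STATEMENT", [lines[i]]))
            i += 1
    return chunks


for label, block in chunk(ROUTINE):
    print(label, len(block))
# VARIABLE INITIALIZATION 2
# ARRAY SUMMATION 4
# STATEMENT 1
```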

By chunking, the programmer has taken a piece of code that
could be considered a single chunk, "COMPUTES GRADE AVERAGES",
and formed a representation of it through inductive
reasoning. The original seven lines of code can now be
interpreted as:

- Initialize variables

- Sum grades

- Divide sum by number of grades summed

This representation can stay in short term memory to be used
for the present task, linked to the representation of
the rest of the program in long term memory, and/or can be
used to learn an averaging algorithm that could then be
used for other tasks as well. Once learned, the
representation could be added to that in long term memory.
VI.   RECOMMENDATIONS

This study has presented a theoretical model of simple
cognitive processes developed and used by programmers.
Further, the study has attempted to demonstrate how the
expert, by using these processes, gains an in-depth
understanding of complex programs. It is unrealistic, at
present, to fully test these ideas because the behavioral
sciences have not yet developed methodologies to do so.
Also, the requisite size and complexity of the programs,
and the time involved, are prohibitive. Research and the
results of limited testing on small-scale programs do,
however, suggest certain design techniques, and coding and
documentation methods, which directly influence the
effectiveness of these processes.

One such area is code structure, which should be
designed so as to suggest chunks to anyone attempting to
comprehend it [Ref. 13: pg. 175]. Functional elements of
the code should be implemented as contiguous blocks of text
whenever possible. Arbitrary GOTOs and forward and
backward JUMPs should be avoided. Control flow statements
should be used to direct flow from the exit point of one
chunk to the entry point of others. All these
considerations enhance the chunking process by making blocks
of code recognizable as single functions. This makes it
easier to use the text of the program as an external memory
for those chunks.
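As a sketch of these structuring guidelines, the grade program can be written so that each functional element is a contiguous block with a single entry and exit, and control passes from the exit of one chunk to the entry of the next. The function names and the letter-grade cut-offs are hypothetical; they are not taken from the program discussed earlier.

```python
from typing import Iterable

SENTINEL = 999


def read_grades(tokens: Iterable[str]) -> list[int]:
    """Chunk: read one student's grades up to (not including) the sentinel."""
    grades = []
    for token in tokens:
        grade = int(token)
        if grade == SENTINEL:
            break
        grades.append(grade)
    return grades


def average(grades: list[int]) -> float:
    """Chunk: arithmetic mean of a non-empty list of grades."""
    return sum(grades) / len(grades)


def letter_grade(avg: float) -> str:
    """Chunk: map a numeric average to a letter (hypothetical cut-offs)."""
    for cutoff, letter in ((90, "A"), (80, "B"), (70, "C"), (60, "D")):
        if avg >= cutoff:
            return letter
    return "F"


def grade_student(student_id: str, tokens: Iterable[str]) -> tuple[str, str]:
    """Top-level chunk: each call flows from one chunk's exit to the next's entry."""
    grades = read_grades(tokens)
    return student_id, letter_grade(average(grades))


print(grade_student("S17", iter(["90", "80", "70", "999"])))  # ('S17', 'B')
```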

Tests conducted by WEISER also indicated that code
structure influences slicing [Ref. 15]. It was found that,
among 21 expert programmers, a much higher degree of slicing
took place when analysing a poorly structured program with
indiscriminate use of GOTOs and non-mnemonic variable names
than when analysing programs which make use of modular
designs, mnemonics, and comments. The value of the proper use
of mnemonics and comments to the slicing process is that
they explicitly show data flow and group associated
statements and functions. This lessens the need for
programmers to ferret out this information. One can
conclude that less effort was required to achieve an equal
level of understanding when good programming techniques were
employed. Using these techniques maximizes the effectiveness
of slicing while minimizing the effort necessary.

Comments and mnemonics are also helpful to the chunking
process. A well-placed comment, specifying the purpose of a
block of code, and perhaps the data elements affected,
explicitly identifies a functional chunk. This chunk could
then easily be encoded based on the comment alone,
eliminating the need for code analysis at that point.
Meaningful mnemonics would give some insight into the purpose
of the items they name, and thus both aid the recognition and
chunking of complex data structures and help to form correct
hypotheses. These could then be incorporated into still
larger chunks, allowing the many data elements which make up
the structure to be processed as a single element in memory.

Program documentation can itself be a wealth of
information for the expert programmer. A natural language
explanation of the approach taken in originally designing
the software facilitates the formulation of a fairly
accurate hypothesis regarding its implementation. Citing
explicitly the algorithms employed enables verification of
certain hypotheses without extensive code analysis. Using
this information, the maintainer can more easily focus on
certain functions or behaviors of the code without having to
first analyse it in depth to determine the specifics of its
implementation. If exceptions to standard algorithmic
coding are noted, the programmer is spared from having to
determine why the code was written that way. Also, if subtle
effects of the code are documented, along with any potential
side effects, less testing would be necessary when a
modification is made.
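Documentation of this kind might take the form of the docstring below, which cites the algorithm, records a deviation from the standard routine, and lists side effects. The function and its list-clearing behavior are hypothetical, invented only to give the docstring something to document.

```python
def average_and_reset(grades: list[float]) -> float:
    """Return the arithmetic mean of `grades`, then clear the list.

    Algorithm: running sum divided by the number of grades.
    Deviation from the standard routine: the list is emptied in place so
    the caller can reuse the same buffer for the next student (a
    hypothetical requirement, recorded so a maintainer need not
    rediscover it).
    Side effects: mutates `grades`; any other reference to the same list
    also sees it emptied.
    Raises: ZeroDivisionError if `grades` is empty.
    """
    mean = sum(grades) / len(grades)
    grades.clear()
    return mean
```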

One final area which positively affects the use of these
processes is standardization on all levels. Use of a
standard design methodology would allow programmers to learn
how best to chunk and slice certain representative software
formats. 'Beacons' identifying certain functional areas
could be learned and used effectively. Automatic tools to
aid these processes could also be developed with less
difficulty.

On a more specific level, standardization of algorithms
and their corresponding constructs would greatly simplify
the task of comprehension. Experts would be able to
incorporate these into their knowledge bases, learning them
from both the functional and the behavioral points of view.
Also, coding templates could be learned and associated with
these, aiding recognition of the code itself.

Similar ideas have been used in most other engineering
fields with great success. While software engineering is
not, in many respects, as rigorous as these other
disciplines, standards could be made flexible enough so as
not to inhibit progress. Software reusability is the
motivation for recently generated interest in this area.
The programming language ADA is the first step in an attempt
at achieving some of this standardization, and its use in
conjunction with these processes may serve to verify their
validity.